Michael Steptoe, VADER Lab, Arizona State University, msteptoe@mainex1.su.edu PRIMARY
Robert Krueger, VIS, University of Stuttgart, robert.krueger@vis.uni-stuttgart.de
Yifan Zhang, VADER Lab, Arizona State University, yifan.zhang@asu.edu
Xing Liang, VADER Lab, Arizona State University, xliang22@asu.edu
Rolando Garcia, VADER Lab, Arizona State University, rsgarci1@asu.edu
Sagarika Kadambi, VADER Lab, Arizona State University, skadambi@asu.edu
Wei Luo, VADER Lab, Arizona State University, wluo23@asu.edu
Thomas Ertl, VIS, University of Stuttgart, Thomas.ertl@vis.uni-stuttgart.de
Ross Maciejewski, VADER Lab, Arizona State University, rmacieje@asu.edu
Student Team: YES
Approximately how many hours were spent working on this submission in total?
~500 hours between all participants
May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2015 is complete? Yes
Video Download
Video:
https://youtu.be/LUZr3qEt7Qo
--------------------------------------------------------------------------------
Questions
For each of the following questions, consider both the
movement and communications data.
GC.1 – Scott is not a paying customer and does not
have an ID. Describe Scott Jones’ activities in the park during the three-day
weekend. Who does he spend most of his time with? When does he arrive? When does
he leave? What route does he follow?
Limit your
response to no more than 10 images and 1000 words.
We have developed a visual analytics interface for exploring the spatiotemporal movement and communications data in Dinofun World during the weekend of Scott Jones’ visit. Our system consists of three views: the Exploratory View, the Report View, and the Communications View. The Exploratory View (Figure 1) has five primary components:
1. Analytics Interface:
a. ID selection: A user can input a list of visitor IDs and view their trajectories on the map (2). When the user presses play, the trajectories are animated and the communications data is visualized as points on the map that appear at the time of the call and then fade.
b. Visual query: A user can select time and location intervals on the calendar view (3) and combine them into a visual query with logic operators (AND/OR/NOT). Such a query returns, for example, the IDs of all patrons that were at attraction 38 at 4PM and at attraction 45 at 9PM on Friday. This is our primary feature for finding visitors that were at locations of interest at particular times. IDs and trajectories returned from the query are plotted in the trajectory view.
c. Cluster: All visitor trajectories can be clustered using a Levenshtein distance function and hierarchical clustering. If a tolerance of 0 is selected, the resultant clusters consist of the IDs with identical trajectories (in terms of locations visited at the same time). Increasing the tolerance provides fuzzier clusters (i.e., the members have visited ‘mostly’ the same locations at the same time during their stay). Groups found are plotted in the trajectory view, where each group is shown by its most representative trajectory.
d. Outliers: The larger a trajectory's smallest Levenshtein distance to any other trajectory, the more unusual that trajectory is. This slider returns the top n IDs with the largest such distance. The IDs and trajectories are plotted in the trajectory view.
e. Calendar aggregation: This controls how the rows in the calendar view are sorted (by region, attraction or ride type) as well as the data plotted in the cells of the calendar view (data can be the number of visitors at a ride, or the number of sent/received/external/unique calls sent from a ride at time t).
2. Map View: This view shows the trajectories of selected IDs and animates their movements over time, showing communications during the animation. This view is also linked to the trajectory view: when the user brushes over a section of a trajectory, that movement segment is plotted on the map. A heat map that colors each pixel by the number of times a visitor stepped there can also be displayed.
3. Calendar View: Each row represents an attraction in the park and each cell is colored based on the aggregation chosen in control 1e. Data can be viewed for each day separately, or aggregated over all three days in the ‘any day’ view. The ‘every day’ view shows the counts of IDs that were at the same place at the same time every day. Each cell represents 30 minutes of time.
4. Trajectory View: This is a pixel-based representation. Each cell is a 5-minute time interval that is colored based on the location of the visitor in the park.
5. Distribution View: This provides a histogram of the number of sent/received/external/unique calls made during a time period. The y-axis is the number of IDs and the x-axis is the number of communications made (a histogram of call distribution by ID). Users can click a bin to see all the IDs in that bin. In this way we can find the IDs with unusually large numbers of communications.
Figure 1: Exploratory View
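The distance behind the Cluster and Outliers controls can be illustrated with a minimal sketch. It assumes each trajectory is encoded as one attraction ID per 5-minute cell, as in the Trajectory View; the function names and the toy trajectories are hypothetical and are not taken from our system. The hierarchical clustering step is not shown.

```python
from typing import Dict, List, Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Edit distance between two location sequences (one symbol per time cell)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def outlier_scores(trajs: Dict[int, List[int]]) -> Dict[int, int]:
    """Smallest distance from each trajectory to any other trajectory."""
    return {
        vid: min(levenshtein(t, u) for other, u in trajs.items() if other != vid)
        for vid, t in trajs.items()
    }


# Hypothetical toy data: attraction ID occupied in each 5-minute cell.
trajectories = {
    101: [1, 1, 8, 8, 32],
    102: [1, 1, 8, 8, 32],   # identical to 101 (same cluster at tolerance 0)
    103: [1, 8, 8, 8, 32],   # differs from 101 in one cell
    104: [62, 62, 62, 7, 7], # shares no cell with the others
}
scores = outlier_scores(trajectories)
top_outliers = sorted(scores, key=scores.get, reverse=True)[:1]
```

In this toy example visitors 101 and 102 have a score of 0 because each has an identical twin, while visitor 104 has the largest score and would be returned first by the outlier slider.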
Once a user identifies IDs of interest, they can use the Report View to create side-by-side comparisons of ID trajectories and communication networks, or to explore clusters of IDs (created by Figure 1, control 1c). Feature vectors (such as the number of thrill rides visited), communication metrics (centrality) and trajectories are all shown. Figure 2 shows several of the possible views that can be explored. By double-clicking an ID in an image, the user can also retrieve all other IDs that exchanged communications with that ID, which allows for quick exploration of the communication network.
Figure 2: Report View
Our first goal is to determine a time and location at which Scott is known to be present. We were told that there are two shows a day in the park, and the calendar view reveals that these take place at the Grinosaurus Stage. The first show starts around 9:30AM and finishes around 10AM (Figure 3-1). The second show starts around 2:30PM and ends around 3PM (Figure 3-2). We quickly see that on the last day the second show does not take place. We hypothesize that the vandalism was discovered after the first show on Sunday, resulting in the cancellation of the second show.
Figure 3: Finding Scott's shows
Since we know when and where Scott was at certain times from Friday to Sunday, we can create a visual query that requests all IDs that were at the Grinosaurus Stage during the Friday, Saturday and Sunday showtimes. This query (Figure 4-1 and 4-2) returns a set of 8 IDs. These IDs follow identical paths through the park (Figure 4-3 and 4-4), from the hotel to the stage and back. We hypothesize that the soccer star spends his time with these staff members, who accompany him to the stage, arriving a few minutes before each show starts, and back again.
Figure 4: Scott's entourage only goes to the hotel and
stage.
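The logic of such a visual query can be sketched with set operations. This is only an illustration under stated assumptions: the `Sighting` record, the `ids_at` helper, the choice of the morning show as the query interval, and the toy sightings are all hypothetical and do not reflect the actual data schema or our implementation.

```python
from typing import Iterable, NamedTuple, Set


class Sighting(NamedTuple):
    visitor_id: int
    day: str          # "Fri", "Sat" or "Sun"
    attraction: str
    minute: int       # minutes since midnight


def ids_at(sightings: Iterable[Sighting], day: str, attraction: str,
           start: int, end: int) -> Set[int]:
    """IDs seen at `attraction` on `day` within the half-open interval [start, end)."""
    return {s.visitor_id for s in sightings
            if s.day == day and s.attraction == attraction
            and start <= s.minute < end}


STAGE = "Grinosaurus Stage"
SHOW_START, SHOW_END = 9 * 60 + 30, 10 * 60   # morning show, 9:30AM to 10AM

# Hypothetical toy sightings.
sightings = [
    Sighting(1, "Fri", STAGE, 575), Sighting(1, "Sat", STAGE, 580),
    Sighting(1, "Sun", STAGE, 590),
    Sighting(2, "Fri", STAGE, 575), Sighting(2, "Sat", STAGE, 585),
    Sighting(3, "Fri", STAGE, 700), Sighting(3, "Sat", STAGE, 575),
    Sighting(3, "Sun", STAGE, 575),
]

fri = ids_at(sightings, "Fri", STAGE, SHOW_START, SHOW_END)
sat = ids_at(sightings, "Sat", STAGE, SHOW_START, SHOW_END)
sun = ids_at(sightings, "Sun", STAGE, SHOW_START, SHOW_END)

every_show = fri & sat & sun        # AND: present at the show on all three days
any_show = fri | sat | sun          # OR: present at the show on at least one day
missed_sunday = (fri & sat) - sun   # NOT: Friday and Saturday but not Sunday
```

Each selected calendar cell becomes one set of IDs, and the logic operators map directly onto intersection, union and difference.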
We also explore the communication data for these 8 IDs
and find that none of these IDs sent any communications during the
weekend. This is strange as we would
expect his handlers to be informed of the vandalism; however, it also seems to
indicate that Scott does not arrange to meet any friends in the park either.
Figure 5: Scott's entourage doesn't
talk to anyone
GC.2 –
Identify up to 8 issues with park operations during the three-day weekend. Provide a rationale for your answers.
Limit your
response to no more than 8 images and 800 words.
1. From the calendar view, we can quickly see that a variety of rides are closed at various times during the weekend. For example, Galactosaurus Rage is closed Friday from 19:30-20:00, Stone Cups is closed Friday from 20:00-20:30, and the Flying TyrAndrienkos also closes. There are several other closures like these, but they are likely routine and seem to get resolved rather quickly.
Figure 6: Rides are broken, kiddie land is sad.
2. On Friday between 20:00 and 23:30, there is a huge increase in the number of sent messages (684) compared to the previous 30 minutes (59). We hypothesize that something may have occurred at the ride.
Figure 7: What's enchanting in kiddie land?
3. On Friday between 1:30PM and 2:30PM many people go to the Ligament Fix-Me-Up stand. Something may have happened that left people injured: we see individuals coming from attractions 1, 3, 7 and 8. However, these are large groups that travel together, so it is unlikely that all of the individuals are hurt, perhaps just one or two. Still, it should be investigated.
Figure 8: Injuries or just overly concerned families?
4. Some visitors have missing recordings. For example, sometimes there is no check-in information even though a visitor stays at a ride for hours. Sometimes movement information is missing for a while, and only the last couple of movements are recorded before a visitor leaves the park. We hypothesize that the app is sometimes unreliable. The image below shows such a case: the visitor enters the Tyrannosaurus Rest bathroom and no other movement data or check-ins are recorded until 8:40PM.
Figure 9: You were in there for how long?
5. In our exploratory view, the trajectory view can be replaced with a probability view. For each attraction, we calculate the probability of each attraction that visitors may go to next. The arc diagram shows the most likely place to go next. Arcs on the top read from left to right, arcs on the bottom from right to left. Here we can see that Whitley’s Plushadactyl stand (attraction 43) is not the most likely next stop from any attraction. We also see that people who visit souvenir shops (attractions 40, 41, 44-48) are most likely to visit a thrill ride next. These rides should offer storage to encourage shopping and riding. However, what this really shows is that people are not using the app to check in to restaurants or stores. It is currently not possible to determine which groups of people are making purchases in the park without inferring check-ins from movements. Thus the park cannot easily determine turnover rates for the stores, or how much time (on average) a paying visitor spends at a store vs. a non-paying visitor. The theme park has a financial interest in identifying its highest-paying customers. We are not sure whether no one goes to the Plushadactyl stand, or whether sales there are quick enough to result in waits of less than 5 minutes (our inferred check-in threshold).
Figure 10: Buy then ride? Probability plots showing where you're likely
to go next.
Figure 11: No one wants a plushadactyl
6. If a venue's proximity to attractions is what determines visitor count (as opposed to the product being sold), then we would expect more visitors at Paleo Shreckwiches (36), because it is nearest to the most popular attractions (thrill rides 1, 2 and 8). However, people seem to prefer going from Smoky Wood BBQ (53) to thrill rides. Because venue 36 is closer to the thrill rides, it is positioned to cater to more customers. From the arc view, though, we know that people who like BBQ also like thrill rides, whereas people who like sandwiches tend to go to the beer garden (34) and Rides for everyone (30). Selling BBQ at venue 36 could therefore increase its visitor count.
Figure 12: Sandwich and beer, bbq and rollercoaster?
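The next-attraction probabilities behind the arc diagram in issues 5 and 6 can be sketched as a first-order transition estimate. This is a minimal illustration, assuming each visitor is reduced to an ordered list of checked-in attraction IDs; the function name and the toy sequences are hypothetical and the numbers are not results from the challenge data.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List


def transition_probabilities(sequences: Iterable[List[int]]) -> Dict[int, Dict[int, float]]:
    """Estimate P(next attraction | current attraction) from visit sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for here, nxt in zip(seq, seq[1:]):
            if here != nxt:                 # ignore repeated check-ins at the same place
                counts[here][nxt] += 1
    return {here: {nxt: n / sum(c.values()) for nxt, n in c.items()}
            for here, c in counts.items()}


# Hypothetical toy check-in sequences (attraction IDs in visiting order).
sequences = [[53, 1, 8], [53, 2], [36, 34, 30], [36, 34], [53, 1]]

probs = transition_probabilities(sequences)

# One arc per origin: the single most likely next stop.
most_likely_next = {here: max(p, key=p.get) for here, p in probs.items()}

# Attractions that are never anyone's most likely next stop.
all_attractions = {a for seq in sequences for a in seq}
never_most_likely = all_attractions - set(most_likely_next.values())
```

An attraction that ends up in `never_most_likely` plays the role the Plushadactyl stand plays in Figure 11: it may still be visited, but it is never the top destination from any other attraction.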
GC.3 – For
the crime, describe the following, and provide your rationale:
a. When did the crime occur?
b. Where did the crime take place?
c. Who are the most likely suspects in the crime?
Limit your
response to no more than 5 images and 500 words.
We hypothesize that the crime occurred between 9:45AM and 11:30AM
on Sunday at the Creighton Pavilion based on visitor patterns from previous
days.
Figure 13: Using the calendar view to
narrow in on the crime time.
To identify suspects, we create a visual query that returns all IDs that were in the pavilion between 9:45AM and 11:30AM for more than 5 minutes (the threshold for an inferred check-in is 5 minutes). ID:1502920 also has a hard check-in (recorded by the park) at 9:30. Figure 14 shows the calendar view for hard check-ins, and IDs with both soft and hard check-ins during this time.
Figure 14: Suspect trajectories after
the visual query.
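The idea of an inferred (soft) check-in can be sketched as a dwell-time rule. This is a simplified illustration, assuming a visitor's movement samples are time-ordered and already mapped to an attraction name (or `None` when the visitor is on a path); the function name, the way the dwell is measured (first to last sample of a run), and the toy samples are hypothetical.

```python
from itertools import groupby
from typing import List, Optional, Tuple

Sample = Tuple[int, Optional[str]]   # (timestamp in seconds, attraction or None)


def inferred_checkins(samples: List[Sample],
                      min_seconds: int = 300) -> List[Tuple[str, int, int]]:
    """Return (attraction, start, end) for every stay of at least `min_seconds`."""
    checkins = []
    # groupby collects consecutive samples that share the same location.
    for location, run in groupby(samples, key=lambda s: s[1]):
        run = list(run)
        start, end = run[0][0], run[-1][0]
        if location is not None and end - start >= min_seconds:
            checkins.append((location, start, end))
    return checkins


# Hypothetical toy movement samples for one visitor.
samples = [
    (0, "Creighton Pavilion"), (120, "Creighton Pavilion"), (360, "Creighton Pavilion"),
    (400, None),
    (460, "Grinosaurus Stage"), (520, "Grinosaurus Stage"),
    (600, None),
]
soft_checkins = inferred_checkins(samples)
```

Here the six-minute stay at the pavilion counts as an inferred check-in, while the one-minute pass by the stage does not.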
Exploring ID: 1502920, we look at
the communication network, Figure 15, and discover that this ID communicates
with 6 other visitors. We visualize their movement sequences and discover that
ID:461004 and ID:416790 have the exact same sequence as the initial visitor but
do not have hard check-ins to the pavilion even though their movements put them
there at 9:30.
Figure 15: Communication networks
identify more interesting suspects.
We use the Communications View (Figure 16). Its input is a list of IDs. These IDs are represented as circles, the size of which represents the number of IDs that are at the same location. The x-axis is time and the y-axis corresponds to attractions. If a group of IDs moves to a different attraction, a green line is drawn. IDs may leave or join groups, forming new groups. Communications for these IDs are drawn as yellow lines for communications between two groups, red hashes for external communications, and blue hashes for within-group communications. The slope represents the number of communications.
Figure 16 explores the 7 IDs. We see the seven visitors enter the park together. After entering the park they split into three groups: G1 (1502920, 461004, 416790), G2 (1123214, 1350546), and G3 (1000279, 1187909). We see G3 waiting at attraction 8 for Scott to pass. Then G3 goes to attraction 5 and waits until Scott enters the stage (~9:30). During this time, G2 leaves the pavilion and waits at attraction 7. G3 and G1 communicate with each other around 10AM, and then G3 joins G2 at attraction 7 around 10:45. G1 communicates with the merged G2 and G3 several times from 10:55 to 11:10. We hypothesize that the crime takes place before these messages, between 10AM and 10:55AM. G1 and G2 then meet at attraction 6 after the pavilion has been re-opened to the public at 11:30AM.
Figure 16: We track how the suspects
who are communicating travel together.
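The grouping that drives the circles in the Communications View can be sketched as follows. This is a minimal illustration, assuming the input is one snapshot per time step that maps each selected ID to its current attraction; the function names, visitor IDs and attraction numbers in the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List


def colocated_groups(positions: Dict[int, int]) -> Dict[int, FrozenSet[int]]:
    """Group visitor IDs by the attraction they are at in one snapshot."""
    groups = defaultdict(set)
    for visitor_id, attraction in positions.items():
        groups[attraction].add(visitor_id)
    return {attraction: frozenset(ids) for attraction, ids in groups.items()}


def group_timeline(snapshots: List[Dict[int, int]]) -> List[Dict[int, FrozenSet[int]]]:
    """Apply the grouping to every time step, in order."""
    return [colocated_groups(snapshot) for snapshot in snapshots]


# Hypothetical toy snapshots: visitor ID -> attraction ID (0 stands for the entrance).
snapshots = [
    {1: 0, 2: 0, 3: 0, 4: 0},     # everyone enters together
    {1: 32, 2: 32, 3: 8, 4: 8},   # the party splits into two groups
    {1: 32, 2: 7, 3: 7, 4: 7},    # visitor 2 leaves one group and joins the other
]
timeline = group_timeline(snapshots)

# Circle size per attraction and time step = number of co-located IDs.
circle_sizes = [{a: len(ids) for a, ids in step.items()} for step in timeline]
```

Comparing the groups in consecutive steps shows where IDs leave or join, which is where a new group would be drawn.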
The above story is plausible as we know Scott has three local
friends and three suspects travel together.
Another suspect is ID:1983765. On the day of the crime, 1983765 visits the pavilion and leaves during normal hours of operation. This person then goes to the Scholtz Express (the train circling the park) and rides it for two hours, whereas all other IDs that board the train at this time of day ride for 20 minutes. It is possible that this person left their tracking device on the train, went to the pavilion, stole Scott’s medals, and then retrieved the device and left. This person has no communication data even after spending three days at the park.
Figure 17: Train-man
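The comparison of ride durations can be sketched as a simple check against the typical duration. This is only an illustration: the function name, the factor of 3 used as a cutoff, and the visitor IDs in the toy data are assumptions, with the durations chosen to mirror the 20-minute and two-hour rides described above.

```python
from statistics import median
from typing import Dict, List


def unusually_long_rides(durations: Dict[int, float], factor: float = 3.0) -> List[int]:
    """IDs whose ride lasted at least `factor` times the median ride duration."""
    typical = median(durations.values())
    return sorted(vid for vid, minutes in durations.items()
                  if minutes >= factor * typical)


# Hypothetical toy data: minutes spent on the train by visitors boarding at the same time of day.
ride_minutes = {9001: 20, 9002: 20, 9003: 20, 9004: 120}
flagged = unusually_long_rides(ride_minutes)
```

With a median of 20 minutes, only the two-hour ride is flagged.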